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Abstract — An extensive body of research deals with estimating 
the correlation and the Hurst parameter of Internet traffic traces. 
The significance of these statistics is due to their fundamental 
impact on network performance. The coverage of Internet traffic 
traces is, however, limited since acquiring such traces is chal- 
lenging with respect to, e.g., confidentiality, logging speed, and 
storage capacity. In this work, we investigate how the correlation 
of Internet traffic can be reliably estimated from random traffic 
samples. These samples are observed either by passive monitoring 
within the network, or otherwise by active packet probes at end 
systems. We analyze random sampling processes with different 
inter-sample distributions and show how to obtain asymptotically 
unbiased estimates from these samples. We quantify the inherent 
limitations that are due to limited observations and explore the 
influence of various parameters, such as sampling intensity, net- 
work utilization, or Hurst parameter on the estimation accuracy. 
We design an active probing method which enables simple and 
lightweight traffic sampling without support from the network. 
We verify our approach in a controlled network environment and 
present comprehensive Internet measurements. We find that the 
correlation exhibits properties such as long range dependence 
as well as periodicities and that it differs significantly across 
Internet paths and observation times. 

I. Introduction 

Traffic characteristics play a key role in planning and opera- 
tion of packet data networks. As a consequence, in recent years 
network measurements have attracted considerable attention as 
a practical method for inferring traffic properties. The scope of 
such measurements varies from access networks to backbone 
networks or even across the Internet. 

Numerous comprehensive measurement studies, based on 
recorded network traces, have revealed that aggregate Internet 
traffic possesses long memory correlations, so-called long 
range dependence (LRD) (§, (12), (19), (30). The impact 
of LRD on network performance was investigated in several 
works, e.g., (10), (TTJ, (20), (25), (26), (2§, (34), (35). 
Networks fed with LRD traffic exhibit a fundamentally dif- 
ferent behavior compared to systems fed with memoryless or 
Markovian traffic. 

In practice continuous logging and evaluation of all relevant 
network events in large networks is typically not feasible due 
to efficiency, confidentiality, and cost factors. For example, 
with link speeds of 10 Gbps and more capturing traffic 
traces becomes increasingly difficult, as suitably large and 
fast storage systems are expensive. One main challenge is 
therefore, to extract the desired information from a subset of 
events, e.g., using a sampling procedure that yields consistent 
estimates of the target metric. In addition, ISPs rarely disclose 



traffic traces because of confidentiality issues such that traffic 
characteristics can only be inferred from external observations. 
Further, a fundamental limitation of traffic traces is that these 
reflect traffic characteristics at only a single observation point. 

In this work, we investigate the problem of estimating the 
correlation of Internet traffic given a limited set of random 
samples. First, we consider passive sampling, i.e., capturing 
traffic samples at some directly accessible node, e.g., a router. 
Here, the main focus is on the choice of the sampling process 
and it's properties. Further, for any practical realization passive 
sampling yields a finite sample size, which directly influences 
the accuracy of the results. Secondly, we consider active 
probing that is a technique, where external measurements 
of specific probe packets are used. The aim is to avoid 
any particular network support by exploiting, e.g., timing 
information that is imprinted on the probes by interaction with 
network traffic. The additional challenge of active compared 
to passive methods is to design probes that actually permit 
inferring the desired traffic characteristics, which in certain 
cases may even be impossible (24) . 

The ultimate result of this work is to enable the online 
estimation of traffic correlations along network paths without 
network support. To this end, we present methods for extract- 
ing LRD characteristics from sampled traffic. We derive the 
impact of sampling on the observed traffic correlations for dif- 
ferent sampling strategies and show that sampling may distort 
observations. We develop methods that reverse these effects 
for a set of sampling processes. We quantify the accuracy 
of the observations under finite sampling durations, showing 
that the estimation error increases as t 2 ~ 2H with the autoco- 
variance lag r and the LRD Hurst parameter H € (0.5,1). 
We derive the impact of different sampling parameters on 
estimation accuracy and show a non-linear trade-off between 
sampling intensity and sampling duration. Finally, we design 
and evaluate a practical active probing method to estimate 
traffic correlations from external observations. We present 
practical testbed and Internet measurement results showing a 
complex covariance structure of Internet traffic that exhibits 
LRD as well as periodic behavior. 

The paper is structured as follows: In the next section we 
present the state-of-the-art on LRD network traffic charac- 
teristics, sampling and active network probing. In Sect, [ill] 
we derive our main results concerning traffic sampling and 
the accuracy of the estimated traffic parameters. In Sect. [V] 
we present and deploy an active probing method that uses 



packet probes to infer traffic correlations. Sect. VI concludes 
the paper. 

II. Related Work 

In the following, we discuss related work on LRD traffic 
characteristics, sampling and network probing. 

A. LRD traffic characteristics 

Comprehensive measurements in the 90s, e.g., (8), (12) , 
[ [19) , 1 30 1 revealed that aggregate Internet traffic exhibits 
LRD and self-similarity phenomena, that can be described 
by the so-called Hurst parameter H. A self-similar stochastic 
process possesses the same finite dimensional distributions 
on different time scales except for a rescaling factor which 
depends on H. The aggregation of multiple traffic sources 
offers a possible explanation of these characteristics. It was 
shown in |40| that aggregating many on-off sources with heavy 
tailed on and off periods yields self-similar LRD traffic. This 
notion corresponds to file transfers from heavy tailed file size 
distributions as observed on storage systems [|8|, [45 1. An 
experimental validation of the relation between self-similarity 
and heavy-tailed distributions is carried out in (23) on a large- 
scale experimental facility. 

Given a stationary process Y(t), LRD manifests itself in 
the slow decay of the autocovarianc^] cy (r) such that 
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where a\ is the variance of Y(Q) and the Hurst parameter 
H £ (0.5,1). The sum of the autocovariance over all lags r 
diverges, i.e., °y{ t ) co- 
in this work, we focus on the autocovariance structure of ([TJ. 
Our goal is to infer Q from traffic observations, respectively, 
to estimate the Hurst parameter H from from the slope of 
cy(r) on a log-log scale. Numerous other methods exist for 
estimating the Hurst parameter from LRD and self-similar time 
series g), (39), (42). 

In addition to ([TJ, we consider two established methods for 
estimating the Hurst parameter H [4], |39). First, we consider 
the aggregate variance method, that relies on the convergence 
rate of the sample mean of an LRD process to the true mean. 
Given samples of Y(t) of size M, the variance of the sample 
mean decays as ~ M 2H ~ 2 with growing M. 

The second method denoted power spectral density method 
relies on the behavior of the spectral density of the LRD 
process Y(t). The spectral density of Y(t) is well 

known (4). It can be approximated as 



*y(/) ~ \f\~ 2H for/^0. 
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The Hurst parameter H can be estimated from the slope of 
*S>y(f) plotted against the frequency / on a log-log scale. 

1 Throughout this work, we use the definition of autocovariance in the signal 
processing sense, i.e., for a stationary process Y(t) the autocovariance is 
defined as cy(r) := E[Y(t)Y(t + r)[ - E[Y(t)]E[Y(t + r)]. For brevity, 
we frequently use the term covariance to mean autocovariance. 



B. Sampling 

Sampling is widely used to reduce the data processing 
and storage requirements as well as to circumvent problems, 
such as system inaccessibility and hardware access latency. A 
fundamental result often employed in the sampling context is 
known as PASTA, Poisson Arrivals see Time Averages |46|. 
PASTA states that the portion of Poisson arrivals that see a 
system in a certain state corresponds, in average, to the portion 
of time the system spends in that state. 

Further, the authors of [27] establish general conditions, 
such that Arrivals See Time Averages (ASTA) holds, i.e., bias 
free estimates are not limited to Poisson sampling. In a recent 
work the authors of [3| coined the term NIMASTA, i.e. Non- 
intrusive Mixing Arrivals See Time Averages, in the context of 
network measurements. Using an argument on joint ergodicity, 
the authors prove an almost sure convergence of 
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where Y(9i) is a sample of the process Y(t) at time 0* and 
g is a general positive function of Y. The sampling times f?j 
for i € N are chosen according to a sampling process. The 
target metric is specified depending on the chosen function g. 
Eq. ([3]) is satisfied when the process Y(t) is ergodic and 
the sampling process is mixing |3j. The authors in [2] show 
that Poisson sampling, though bias free, does not guarantee 
minimum variance estimates. 

A comparison of Poisson and periodic sampling was carried 
out in (37), pT[ . In pT] the authors show experimentally, that 
the differences between round trip times (RTT), loss rate and 
packet pair dispersion estimates, obtained by either Poisson or 
periodic probing, are in some cases not significant. Depending 
on the autocovariance of the sampled process, Poisson or 
periodic sampling can be superior. This is shown in [ 37] using 
the metric asymptotic variance. 



In 1 3 1 1 it is shown that for correlation lags tending to 



infinity, random sampling captures the long memory of the 
original processes, as long as the sampling distribution has a 
finite mean. 

C. Active network probing 

The injection of test packets into a network for inferring 
network performance, i.e., active probing, has attracted con- 
siderable attention in recent years. End-to-end packet delays 
or inter packet times are metrics commonly used to estimate 
network characteristics such as the average available band- 
width or even to reconstruct statistics of the cross-traffic fTT) , 
(18) , (33) , (38) . Under the assumption of FIFO scheduling, 
cross traffic intensity can be estimated from the dispersion of 



back-to-back probing packets (9), (2TJ, (22|, (38 1. 

Cross traffic estimation of LRD traffic using active mea- 
surements was discussed, in (16) , (32). The authors of (16) 
carry out a numerical simulation to interpolate cross traffic 
from probes and predict future traffic from the LRD property. 
In (32) the authors derive and show simulation results for a 



deterministic probing scheme based on a multi-fractal wavelet 
traffic model. Essential to their estimation is the assumption 
that the queue does not empty between the the individual 
packets of a packet probe. Our work differs from fl6) , (32) as 
we examine different random sampling distributions and show 
how to extract traffic correlations from distorted observations. 

Two important aspects concerning network probing are the 
measurement intrusiveness and the interaction of probes with 
the measured system. The first aspect is usually addressed by 
minimizing the probing rate while controlling the quality of 
the results. The second aspect is more involved, since the 
probes perturb the system leading to distorted observations. 
For example, measuring queueing delays of probes to de- 
termine the true queue length distribution is governed by a 
type of Heisenberg uncertainty | |36[ , since the probes alter the 
queue length. The authors describe the impact of the probing 
intensity on the accuracy of the result using the notion of 
asymptotic variance. The effect is increased in case of LRD 
traffic, although not given in closed form, leading to higher 
uncertainty in the estimated waiting time p6) . 

III. Traffic sampling and parameter estimation 

In this section we derive our main results on traffic co- 
variance estimation from sampled observations. Based on 
sampling properties we present rigorous traffic parameter 
estimation. Subsequently, we investigate the accuracy of the 
estimates under the practical constraint of finite sample sizes. 

A. Covariance of sampled processes 

We define a sampling model comprising of three stationary 
discrete time processes: a traffic increment process Y(t), a 
sampling process A(t), and an observed process W(t) for 
t G No- We assume statistical independence of A(t) and Y(t). 
Our focus lies on the estimation of the covariance of Y (i) that 
is characterized by LRD. While the LRD process may be in 
continuous time, we regard its increments on a fixed time slot 
basis, and hence the discretization of Y(t). 

The sampling process A(t) is a point process taking the 
value of one whenever a sample is taken, and zero otherwise, 
i.e., A(t) is a Kronecker delta train, where a Kronecker delta 
is defined as S(n) — 1 for n = and zero otherwise. The 
process has independent and identically distributed (iid) inter- 
sample times drawn from a given probability distribution F. 
The inter-sample time is the time between two consecutive 
Kronecker deltas. The sampling intensity, i.e., the mean rate 
of the sampling process of A(t), is E [A(t)] = ha for all t, 
with < HA < L Throughout this work we use H(-) to denote 
the expected value E [(•)]. 

We base our analysis on the observed stochastic process 
W(t), generated by random samples A{t) of the increment 
process Y(t), with 

W(t) = A{t)Y{t). (4) 

We aim to infer properties of the traffic process Y(t) from 
the observation process W(t). In particular, we are interested 
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Fig. 1: Autocovariance of LRD traffic processes under geo- 
metric sampling. The observed "cvi/(t)" maintains the auto- 
covariance structure of the traffic process. The covariance of 
the original process "cy(r) (traffic)" is exactly covered by the 
reconstructed "cy(r) (estimate)". 

in sampling distributions F that deliver accurate estimates 
of the correlations of the LRD traffic process Y(t) and the 
associated Hurst parameter H. Extracting the autocovariance 
of the process Y(t), i.e., cy(r) from the observed cw(t) 
is generally not a straightforward task. The following lemma 
reveals the impact of sampling on the autocovariance of the 
observed process. The proof of Lem. [T] is a variation of 
standard technique in stochastics. 

Lemma 1: Given the stationary and independent stochastic 
processes A(t) and Y(t) and let W(t) = A(t)Y(t). The 
covariance of W(t) can be decomposed into 

cw(t) = (c a (t) + fJ, A ) c y (t) + c a (t)hy- 

Proof: Given independent and stationary processes A(t) 
and Y(t). Let W(t) = A{t)Y(t). It follows that 

c w (t) = E [A(t)A(t + r)] E [Y(t)Y(t + r)] - hWy 

= {ca(t) + Ha) ( c y(t) + Hy) ~ Ma/4 

= ( c a{t) + Ha) c y{t) + c a (t)h y 

where C(.)(r) denotes the covariance of process (•) at lag r. 

■ 

Lem. [T] clearly shows the impact of the sampling process on 
the observed covariance. In particular, the choice of the inter- 
sample distribution influences cw(i~) through ha and ca(t), 
i.e., both the sampling intensity and the sampling covariance 
influence the observation. 

In this work, we investigate four inter-sample distributions: 
geometric (memoryless), periodic, Gamma, and uniform. For 
each distribution we show how to recover the covariance of 
the LRD process cy(r) from the observed cw(t) using the 
covariance ca(t). To this end, we derive the covariance of the 
sampling process ca{t) = E [A(t)A(t + r)] — h\- We use the 



TABLE I: Parametrization of sampling distributions and traffic parameter estimation 
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probability mass function /(r) of the inter sample times to cal- 
culate the n-fold self-convolution /^*™^(r). We then calculate 
the autocorrelation E [A(t)A{t + r)] = \x A Y.n=\ / ( *™ } ( T ) as 
given in {7), Eq. (4.6.1). We exploit the property that 
is a power series for the considered distributions and that its 
sum converges. The derivation of the autocorrelations used in 
Tab. [j] is given in appendi x |VII-A| to |VII-D| In the last step 
we insert ca(t) into Lem. Ill and solve for cy(r). 

Tab. [I] summarizes the expressions used to reconstruct 
cy (t) given specific inter-sample distribution parameters and 
corresponding ca (t). First, we consider the geometric inter- 
sample distribution, i.e., a Bernoulli sampling process. The 
independence of the increments implies that ca(t) — for 
r > 0. From Lem. [T] the observations W(t) have autocovari- 



c w (t) = cy{t)h\. 



(5) 



This shows that sampling processes with uncorrelated incre- 
ments only shift the autocovariance structure of the sampled 
process Y(t) by [i A . 

Next, we consider periodic sampling, where A(t) is mod- 
eled as a comb of Kronecker deltas with sampling period A. 
The mean intensity of the sampling process is ma = We 
can recover cy(r) using Lem. [I] however, only at r = fcA 
where k € N. To perform this inference, the mean rate [iy 
of the traffic process Y(i) must be known. Due to the rigid 
structure of periodic sampling it is, however, shown that the 



mean rate estimator mw/ma is not unbiased [3], e.g., the 
sampling period may coincide with periodicities in the original 
process. 

Finally, Tab. [j] provides expressions for reconstructing 
cy(r) after Gamma and uniform sampling. For mathematical 
tractability, here we use continuous time for the derivation of 
the autocorrelation of A(t). For discretization we use a time 
slot of unit size. Note that the discretization error diminishes 
for autocorrelation lags much larger than the discretization 
time slot. In case of Gamma sampling, the ability to estimate 
Cy(r) is not limited to the exemplary a = 2 given in 
Tab. Lem. [T] can be used to estimate cy(r) for Gamma 
sampling processes with arbitrary parameters as long as the 
autocovariance ca{t) is computable. We provide results for 
Gamma sampling with a = 4 in appendix VII-C For uniform 



sampling with support b Tab. [I] gives a result for lags t < b. 
Due to the finite support, ca{t) quickly approaches zero for 
t > b. Like in case of periodic sampling, the reconstruction 
of cy(r) from Gamma and uniform sampling, respectively, 
requires knowledge of /iy- 

Figures [T] and [2] illustrate autocovariance estimates de- 
rived from observations W(t), that are obtained by sampling 
LPsD traffic with autocovariance cy(r) ~ a Y T 2H ~ 2 and 
H € [0.6, 0.9] j^] We use geometric, periodic, Gamma, and 

2 Synthetic traces of length 2.5 X 10 8 time slots were used for the simulation 
which was repeated 25 times for each considered H. 




(a) Periodic sampling (b) Gamma sampling (c) Uniform sampling 

Fig. 2: Autocovariance of the LRD process under different sampling strategies. Note that "cy(r) (traffic)" is covered by the 
"cy(r) (estimate)". 



uniform inter-sample time distributions and set n A = Q.l. In all 
cases the reconstructed autoco variance denoted "cy (estimate)" 
exactly covers the original traffic autocovariance "cy (traffic)". 

Geometric sampling in Fig.[T]preserves the linear covariance 
structure of cy(r). The observed Cvp(t) is vertically shifted 
by log(fi A ) w.r.t. the original cy(r). The Hurst parameter H 
can be inferred directly from the slope of cw(t). 

For the remaining distributions shown in Fig. [2] the ob- 
servations cw(t) are distorted. However, using Lem. [T] we 
recover the original covariance cy(r). Using the expressions 
from Tab. |I]we reconstruct "cy (estimate)" which lies on top 
of the original autocovariance "cy (traffic)". 

In the following we discuss advantages and disadvantages 
of the presented sampling distributions. Periodic and uniform 
sampling are practically convenient as the inter-sample times 
cannot become arbitrarily large due to the finite support of the 
inter-sample distribution. Moreover, periodic sampling is easy 
to implement. 

However, it is important to point out that periodic sampling 
yields misleading results if the sampling period coincides 
with periodicities in the target process. In addition, periodic, 
Gamma as well as uniform sampling require a reconstruction 
step to estimate the covariance cy(r) from observations as 
shown above. To this end, an estimate of /iy is required. 

Memoryless sampling is proposed by the IETF as a network 
probing scheme ]29) . We find that a major advantage of 
geometric sampling, i.e., memoryless, is that the covariance 
structure of cy (t) is preserved in the observations as given in 
d5J. This stands in contrast to periodic, Gamma and uniform 
sampling. In the following we continue the analysis with 
geometric sampling because of its advantages discussed above. 

B. Impact of finite sample sizes 

Next, we examine the accuracy of the derived estimates for 
finite sample sizes as this is important for any practical real- 
ization. We determine the impact of sampling parameters, e.g., 
sampling duration or intensity, on the observations. Moreover, 
we evaluate the accuracy of the deployed statistical estimators. 



Finally, we recover the results from Sect. III-A in the limit for 
infinite sampling durations. 

We investigate sample autocovariances marked by C(.) as 
estimators of the population autocovariances cr.y In addition, 
we consider the sample means /iq as estimators of the 
population means fir.y To better understand the impact of finite 
sample sizes on the observations and the covariance estimates 
we examine the individual effects of the sample covariances 
involved in a step by step manner. 

While geometric sampling is appealing since it's autoco- 
variance ca (t) = for r > 0, it looses this property for finite 
sampling duration T, where T is the length of the time-slotted 
sampling process A(t) in slots. 

In the following we focus on three aspects. First in subsec- 
tion |III-B1| we derive the impact of finite sample sizes on the 
observability of the covariance of sampled traffic. The second 
aspect is the impact of the sample covariance ca(t) and its 
influence on the estimation error. This is handled in subsection 
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(b) Schematic description 



(a) Sampling 

Fig. 3: Noisy observations due to finite sampling. Noise floor 



(shaded area) in Sect. 



III-B1 



Noise cone in Sect. 



III-B2 



III-B2 The third aspect is the impact of finite sample sizes 
on the bias of the covariance estimators given in subsection 
IIII-B3I 

1) Observation limit: In this section, we consider obser- 
vations from finite sampling. At first, we do not consider 
deviations of sample statistics from respective population mea- 
sures. This assumption is relaxed in the following subsections. 
We investigate the limit up to which the covariance of the 
observed LRD process cw(r) can be distinguished from the 
covariance of iid sequences of the same sample size. Obeying 
this limit ensures that the variability that is due to the sample 
size does not mask the covariance that we seek to observe. 
Exemplary, we depict in Fig. |3(a)| the sample autocovariance 
cy(r) of an LRD traffic trace Y(t), and the corresponding 
autocovariance of geometrically sampled observations c<w(t), 
both with a limited sample size T. Evidently, cw(t) is not just 
a shifted version of cy(r) but distorted for increasing lags r 
by observation "noise" that stems from the variability of the 
limited sample size. 

We seek a range of lags r g [0, r*] in which the covariance 
of the sampled process can be observed without significant 
distortion. Based on a standard technique H) we compare 
the covariance of the observed process to the covariance of 
geometrically sampled iid Gaussian sequences to obtain r* 
up to which both covariances are significantly different. 

We define r* as the intersection of c\y{i~) from |5) and 
the 0.95 confidence interval for geometrically sampled finite 
Gaussian iid sequences with mean /iy and variance a Y - For 
T 3> r we find that this confidence interval is given by 
2y/(a 2 A n Y + ha(?y) 2 + ^ 2 a \i 2 y (<j 2 a ii 2 y + fi A a Y )/VT. The 
calculation relies on the central limit theorem and is given in 



detail in appendix VII-E Fig. 3(b) depicts r* as well as the 
confidence interval, denoted as noise floor, schematically. 

To calculate r* for LRD traffic with covariance cy(r) = 
K<j y t 2H ~ 2 , with constant K, we equate the above confidence 
interval with c^(r) = cy(r)/4 from |5) to obtain 
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It is obvious that stronger LRD, i.e., higher H, is observed 



better. Clearly, for an infinite sample size T —> oo, the 
observable range goes to infinity r* — > oo. Fig. 3(a) shows 



that in practice it is important to consider this range to ensure 
that the results are not strongly distorted. 

2) Estimation accuracy: Next, we evaluate the impact of 
the finite sample size on the sample covariance ca(t). We 
analyze the influence of ca(t) on the observation cvv(t) and 
of estimates of cy (r) obtained thereof. For ease of exposition, 
we assume cy(r) = cy(r), Ji& — /j,a and jly = £ty, i.e., 
in this subsection we restrict our analysis to the deviation of 
c A (j) from c A (r). 

We assume T ^> r and use the central limit theorem 
to approximate the distribution of the sample autocovariance 
ca{t) by a Gaussian distribution with standard deviation 
a A\Z&A + ^^A /\/T — t. From the Bernoulli sampling pro- 
cess A(t) we know that a\ = /j,a — n\. We calculate the 
0.95 confidence interval erf 5 « ±2(7^ + - t 

for the mean sample autocovarianc^] The derivation can be 



found in appendix VII-F 

With help of cyf 5 we investigate the impact of the varia- 
tions of Ca{t) on the observation cw(t). First, we use erf 5 
to calculate a confidence interval for cyp(r) from Lem. [T| 
as c^y (r) = ±crf 5 (cy(r) + /J^/j . We schematically depict 
c w( T ) as nc, i se cone in Fig. 3(b) 

Next, in reference to (J5J we consider the estimator 
Cw{t)/^a f° r estimating cy(r). We analyze the impact of 
the variations of ca(t) on this estimator. We calculate the 
confidence interval 
±cf (cy(r) 
relative error 



/40 



Cy (r) for this estimator as (r) = 
/^a- Finally, we obtain the following 



£y 



(r) 



\c y 5 (t)\ ^ 2a A Va 2 A+^ 
cy(r) 



(6) 



Cy(r) 

From |6]) we observe that the estimation error introduced 
through ca(t) decays with increasing sampling duration T or 
with increasing sampling intensity \xa- For small (practical) 
sampling intensities, e.g., ha < 0.1, we find a nonlinear trade- 
off between sample intensity ha and sampling duration T. 
Using o\ = Ha — Ha from the Bernoulli sampling process 
the prefactor in (|6]l can be approximated as 1/^/T/ia for 
r>r. This result enables the important conclusion that for 
finite sample sizes sampling intensity has a stronger impact 
on accuracy than sampling duration. 

Next, we examine the influence of the parameter H on ([6]) 
for large lags r. For increasing r, cy (r) decreases, such that 
when cy(r) <C fiy> tne relative estimation error (|6]l becomes 



\cy{t) 
2cr A ^/a A ~+4^n Y 2 _ 2H 

T 



2 2 
TH A <Ty 

2 ^2H-2 



(7) 



where we substituted cy(r) = o y t . The relative esti- 
mation error e^ij) increases with the lag r depending on 

3 We use the relation Ri to denote the approximation, here due to the central 
limit theorem. 
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(a) H = 0.6 (b) H = 0.8 

Fig. 4: Estimation error under finite sampling depends on H. 



H e (0.5, 1). For LRD traffic which exhibits large H, the 
estimation error increases slower in t compared to traffic with 
a small parameter H. 

We depict £y'(Y) in Fig. |4] To this end, we used 100 
generated LRD traffic traces with T = 2 x 10 s time slots. 
The figure includes auxiliary lines with a slope of 2 — 2H. It 
is evident, that the estimation error evolves with r as given 

by 0. 

In addition, we calculate the needed sampling duration T to 
achieve constant e r y 1 for a given lag t, and fixed fiA, Hy and 
ay. We find from |7]) that the sampling duration has to increase 



as T ~ max{r 



4-4H 



t}, which again reveals the impact of 



H. Specifically, for H < 0.75 the sampling duration has to 
increase faster than linearly with r to achieve constant e Y . 

3) Bias of autocovariance estimators: Next we inves- 
tigate the accuracy of the deployed statistical estimators. 
The impact of the finite sample size carries forward to 
the computation of the autocovariance of Y(t). First we 
consider the case where we directly observe Y(t) for a 
finite duration T, We consider the autocovariance estima- 
tor Cy(r) 



T=7 Et=l T (VP) - MYo) (V(t + T) - fLY T ) with 

Y^t=i V(t + A n estimator of the autocovari- 



ance is unbiased iff E [cy(r)] = cy(r). To inspect the bias of 
Cy(t), we calculate its expected value and find 



E [cy(r)] fa cy(r) - 



(T- 



\2-2H ' 



(8) 



The derivation of <|8j> is given in appendix VII-G 

From <|8j we conclude that the autocovariance estimator 
cy(r) is asymptotically unbiased for T —> oo and T 3> r. The 
maximum lag, up to which the autocovariance is estimated, 
must be chosen carefully, such that the bias in <|8j becomes 
negligible. However, the bias depends on H such that higher 
H require larger T. 

After considering the entire process Y (t) we now investigate 
the bias of the autocovariance estimator when applied to W(t) 
as observed by sampling with finite duration T. We calculate 
the expected value of the estimated autocovariance 

cw(0) 



E[c w (t)\ 



cvk(t) 



(T-t) 



T-t 

T-r-1 



t)cwW- (9) 
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Fig. 5: Aggregate variance estimate under geometric sampling. 
The variance Var(y( M )) of the traffic process Y(t) is covered 
by the estimate obtained from ( fTTj ) using the variance of the 
observations Var(T4^ M )) respectively the sampling process 
Var(A( M )). 



The derivation of |9} is given in appendix VII-H The bias in 
(|9]l goes to zero for T — > oo and T 3> t. 

In the remainder of this section we provide brief conclusions 
that highlight our main findings. We presented a framework for 
extracting the traffic autocovariance from observed samples. 
From our evaluation of the sampling distributions we conclude, 
that the covariance observed under geometric sampling does 
not exhibit any distortions. This property greatly simplifies the 
reconstruction of the covariance of the original process Y(t), 
as no additional parameters, such as fly, must be estimated. 
Hence, for geometric sampling with sufficiently large T we 
use cw{t)/h\ as an estimator of the traffic autocovariance. 
From the evaluation of the estimator we find two major aspects 
that limit the observability for finite sampling sizes. First, 
finite sampling sizes yield computable distortions given in 



Sect. III-B1 and III-B2 which may obscure the true covariance 



structure. Secondly, the bias for covariance estimators depends 
on the Hurst parameter, such that longer measurements must 
be conducted for traffic exhibiting strong LRD. 

Nevertheless, finite sampling effects disappear in the limit 
for large sampling durations. Moreover, we found that increas- 
ing the probing intensity improves estimation results more 
quickly than increasing the sampling duration. 

IV. Impact of sampling on selected H estimators 

In this section we analyze the impact of sampling on two 
established Hurst parameter estimation techniques, i.e., the 
aggregate variance, respectively, the spectral density method. 

A. Aggregate variance 

Briefly, the method exploits the fact that the variance of an 
LRD, self-similar process considered at different aggregation 
time scales decays linearly in H on a log-log scale. As outlined 



M, denoted aggregation level, and average within each block. 
The aggregate time series on the aggregation level M of Y(i) 
is obtained as 

kM 

Y^ M \k) = — Y ( t ) for keN - ( 10 ) 

t=l+(fc-l)Af 

The variance of the sample means is known to decay with 
the block size as M 2H ~ 2 . The Hurst parameter H is obtained 
from the corresponding slope on a log-log scale. 

The following lemma shows the impact of the sampling 
process on the aggregate variance of the observed process 
W(t). 

Lemma 2: Given Q and the aggregation rule (JTOj. For the 
aggregate processes AS M \ W^ M \ and y( M ) it holds that 

Var(W^ M >) = ^Var(A( M )) + / £Var(y< M >) 



1 

'M 



>yU A 



2 

M2 



M-l 

E 

T=l 



(M -t)c y {t)c a (t). 



The proof of Lem. [2] is given in the appendix VII-I Lem. 
|2] illustrates the relation between the variances of the three 
aggregate processes A^ M \ W^ M > and Y^ M \ From Lem. |2j 
we see the impact of sampling on the observed Var(W^ M ') 
through ca{t) and the parameters of A(t), i.e., ha and o\. 

Again the advantage of geometric sampling is apparent as 
c a(t) = allows solving for Var(y( M ') directly using the 
variances of the sampling process Vai(A t - M ^), respectively, of 
the observed process Var(W( M )) as 

* Var(^ M >) 
- -1 (&Y*r(AM) 



Var(y( M )) = 



1 

M 



~ 2 ~ 2 

a Y a A 



(11) 



The estimate Var(F' M )) in ( fTT) requires estimates of the 
mean and variance of the traffic process. These estimates 
can be obtained from W(t) — A(t)Y(t) as /j,y = Hw/ha 
respectively a Y = {<J\y — &aP>y)J Va where we used that 
a \ + vtx = V" A f° r ^ e B ernou Hi sampling process A(t). 



Fig. |5j depicts the decay of Var(I^( M )) as a function of M 
for geometrically sampled observations. We consider the same 
scenario and parameters as for Fig. [T] in Sect. |III-A Note that 
Var(y( M )) of the original process is covered by the estimated 
aggregate variance using {fl}, denoted "Var(T( M )) estimate . 
A Hurst parameter estimate is deduced from the slope of 
"Var(y( M )) estimate", i.e., 2H — 2. The correct slope for the 
evaluated H = 0.8 is indicated in the figure by the dashed 
auxiliary lines. 

For the remaining samplingdistributions discussed in 

^for Var(Y (M) ) is not easily 



Sect. 



III-A 



the inversion of Lem. 
possible as c^(r) 7^ such that the term that contains cy(r) 
persists. 

In general, as the block size M increases, the impact of the 
terms in the second line in Lem. [2] diminishes, however. For 
M — > 00 the relationship in Lem. [2] tends to 

Var(Vt^ M >) w MyVar(A( M )) +^Var(yW). 



In particular, for M — > oo the observed Wai(W^ M ^) tends to 
/^Var(Y"( M '). This is due to the fact that sampling processes 
A(t) considered here are not LRD. Hence, Var(A( M ') decays 
with slope —1 with M on a log-log scale, whereas Var(Y"( M ') 
decays with 2H — 2. This effect is visible in Fig. [5] as 
Var(W^ M >) tends for increasing M to the auxiliary dashed 
line of slope 2H — 2. 

B. Spectral density 

Spectrum based Hurst parameter estimators rely on the 
characteristics of the frequency domain representation of LRD 
processes. The spectral density of an LRD process Y(t) 
possesses the behavior given in |2]l |4|, |39|. Hence, H can be 
estimated from the logarithm of the spectral density of Y(t) 
plotted vs. log(/). 

We rephrase Lem. [TJ for the spectral density to find 



*w(/) = *a(/)**v(/) 



(12) 



with denoting the spectral density of the process (•). 

We use * to denote the convolution defined as x * y(t) := 
J_ oo x(r)y(t — r)dr. To prove ( fT2| ) we use the Wiener- 
Khinchin theorem, which states that the spectral density 
is given by the Fourier transform of the autocorrelation 
function. The autocorrelation of W(t) = A(t)Y(t), i.e., 
E [W(t)W(i + r)], is obtained as the product of the auto- 
correlation functions of A(t) and Y(t). Finally, the Fourier 
transform of the product of two functions T {x ■ y} (/) is 
given by the convolution of the respective Fourier transforms 
F{x}*F{y}(f). 

For ease of exposition we consider continuous time mem- 
oryless sampling, i.e., inter-sample times drawn from an 
exponential distribution with parameter A. The spectral density 
of A(t) is given, e.g., in @ as = A 2 <S(/) + A with 

the well known Dirac delta function 6(f) that is defined as 
^ oo g(f)6(f)df = g(0) fiil. From ([3} we calculate the 
spectral density of the observed process W(t) as 



*w(f) = *a(/)*<M/) 

= (\ 2 6(f) + \)*y Y (f) 
= \Hy(f)+X(a Y + ^) 



(13) 



The convolution X*^y(f) reduces to A(ery+/iy) using the 
Wiener-Khinchin theorem as A *y(/)d/ = AE [Y(t) 2 ]. 
Consequently, the spectral density of Y(t) can be estimated 
by solving (13i for \&y(/). The estimate ^y(f) requires 
estimates of the mean and variance of the traffic process, i.e.. 



/iy and a Y respectively, that can be obtained as in Sect. IV-A 
An estimate of H is obtained from the slope of ^y(f) on a 
log-log scale. 

Next, we consider periodic sampling in conjunction with the 
spectral density method. For periodic sampling it is known that 
the sampling frequency has to be twice the highest frequency 
contained in the sampled process to avoid aliasing. This is 
known as Nyquist criterion. For periodic sampling ^A(f) is a 
dirac comb with inter-dirac distances of 1/A where A is the 
sampling period. This leads to a repetition of the spectrum 



tyy(f) at distances 1/A. An irreversible spectral overlap 
occurs if the Nyquist criterion is not met, which is the case 
here as 4 r y(/) is not band limited. As a result the method 
of spectral estimation of the Hurst parameter cannot be used 
directly with periodic sampling. Also, the remaining sampling 



strategies from Sect. III-A may not yield an expression for 
^w(f) in (fl3]l that can be solved for ^y(f)- 



As in Sect. [TIT] we find that geometric sampling provides a 
substantial advantage when deploying established H estima- 
tors to sampled observations. For the considered methods, ( fTTj ) 
and ( fT3] > provide the measures needed for H estimation. 

V. Active probing 

So far, we focused on the estimation of traffic correlations 
using passive sampling. In large multi-provider networks like 
the Internet, service providers often do not provide such 
network traces, e.g., for reasons of competition. The estimation 
of traffic correlations, therefore, must rely on inferring samples 
of the Internet traffic from network metrics that can be easily 
observed at end systems, e.g., by active probes. Moreover, 
passive sampling is a priori limited to single links. In case of 
network paths, where the correlations of the end-to-end service 
involve multiple nodes and links, end-to-end measurements 
may be the only viable option. We present an active probing 
method that enables users to characterize end-to-end paths, 
with minimal effort and without administrative support from 
the network under observation. 

In this section, we address the fundamental problem of 
inferring the correlation of LRD traffic using active probes. 
We propose a new active probing method which collects traffic 
samples by detecting router busy periods. The observations 
are used to estimate the covariance of the end-to-end service. 
Subsequently, we estimate the corresponding Hurst parame- 
ter. Furthermore, we show that the well known packet pair 
dispersions approach, which captures the traffic intensity at 
the ingress of a router, is also applicable for the derivation 
of LRD traffic correlations. In the sequel, we describe our 
probing methodology and discuss traffic correlation estimation 
for both the single and multi-node cases. We then show testbed 
measurements to demonstrate the feasibility of our method. 
Finally, we present a set of Internet measurement results 
showing end-to-end correlations of entire network paths. 

A. Probing methodology 

We investigate two probing methods that facilitate the 
inference of certain characteristics of network traffic, referred 
to as cross traffic. The two methods differ with respect to the 
probes, i.e., single packets and packet pairs, and the observed 
metric, i.e., delay and packet pair dispersion, respectively. 

1 ) Single packet probes: To extract an estimate of the cross 
traffic autocovariance, we propose an approach which uses 
the delays of single packet probes to detect busy periods at 
a router, and hence samples the link utilization at the router 
egress. For the remainder of this work, cross traffic denotes 
any traffic sharing resources with the probing traffic. 



We make the general assumption that packet scheduling is 
non-preemptive. Hence, whenever a router is busy transmitting 
a packet, the delay d p experienced by an arriving packet will 
be greater than the minimal delay d m i n experienced when 
the router is idle. Consequently, we can sample cross traffic 
increments at the router egress, by injecting probe packets and 
analyzing their delays. For each probe, we measure the one 
way delay d p = t r — t s , using the send and receive times 
t s and t r , respectively. To determine if the router was busy, 
we check whether the observed delay d p is greater than the 
minimum network delay d m i n . As a result, each probe yields a 
sample of the egress link state at time t = t s , and the observed 
process can be constructed as 



W{t) 



if dp ^> djnifi 

otherwise. 



and A(t) = 1 



(14) 



It is known ]TT| , p3) , that the covariance structure of LRD 
traffic is preserved at the output of a queue or a traffic shaper, 
such that W(t) permits observing the covariance of the cross 
traffic. We assume that the perturbation of the observed traffic 
due to probe size and probing rate is negligible, since the 
probing rates used are typically less than one per mill of the 
capacity. Furthermore, as we can assume that dropped probes 
are due to a busy router, we account for lost probing packets 
by setting W(t) = 1 for all dropped probes of A[t). 

2) Packet pair probes: In the following we present a second 
probing technique to estimate the cross traffic autocovariance 
that uses the dispersion of packet pair probes. Packet pair 
probing is a popular method for estimating the available 
bandwidth of a bottleneck link with FIFO queueing, e.g., |38|. 
A probe consists of two packets, each of size L, that are sent 
into the network back-to-back. The gap between the packet 
send times t St % and t S: 2 is set to g s = t s> 2—t Si i = L/C, where 
C is the capacity of the bottleneck link. The subscript denotes 
the packet location and number, e.g., t s< i denotes the time 
stamp of the first packet of the probe pair at the sender. The 
packet pair dispersion g r = t r 2 — t r i observed at a receiver 
yields a sample of the cross traffic intensity W(t) at t = t S) i 
as (381 



W(t) = C 



9r - 9s 

9s 



(15) 



We set W(t) = if no probe is sent at t as well as if probes 
are lost or incomplete at the receiver. 

We apply this method to infer cross traffic intensities by in- 
jecting packet pair probes at times t s and measuring the packet 
dispersion g r at the probe receiver. Note that the multiplicative 
constant g s /C in ( fT3] > does not alter the covariance structure. 
Hence, we can drop g s /C and estimate the covariance of the 
cross traffic process from W(t) = g r — g s if A{t) = 1 and zero 
otherwise without knowing the absolute value of the bottleneck 
link capacity. 

The packet pair method requires a sufficiently accurate time 
stamp resolution at the receiver to correctly capture the vari- 
ations in g r . As the first packet functions as a time reference, 
while the traffic intensity is sampled by the second packet, 
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Fig. 6: N node topology with probing traffic A pr and LRD 
cross traffic Af for nodes i € [1, N]. 



the approach does not require a synchronization between 
the sender and receiver clocks. However, this results in an 
overhead associated with each probe. As an example, when 
equally sized packets are used, 50% of the probing load is 
"wasted", thereby reducing the effective sampling resolution 
for a given probing rate by a factor of two. In addition, the 
extension of packet pair probing to the multi-node case is not 
trivial. Therefore, we proceed using the first method, i.e., by 
detecting router busy periods. 

B. Measuring LRD in single- and multi-node scenarios 

In this subsection we consider estimating traffic correlations 
in multi-node scenarios using the busy period detection tech- 
nique ( fT4| ). We now show that the observed process Wjv(i) 
at the egress of an N node path with LRD cross traffic also 
exhibits LRD behavior. Moreover, for cross traffic character- 
ized by different Hurst parameters, we show that the largest 
Hurst parameter dominates the covariance of the observed 
process Wjy(t). These results are in agreement with [ 1 3 1 



which shows that the largest Hurst parameter dominates end- 
to-end performance. 

Consider an N node topology with independent LRD cross 
traffic as in Fig. [6] We describe the busy state of each node 
using the processes Yi(t) for node i g [1, N], Hence, Yi(t) = 1 
if node i is busy at time t and Yi(t) = otherwise. Note that 



the covariance cy. (t) ~ r 



2Hi-2 



measured at the egress of 



node i has the same LRD property as the cross traffic input 
at the node fTT) , 1 13 1. Next, consider an active probe that is 
injected into the path. After subtracting the minimum end- 
to-end delay d m j n the observer at the egress of the path will 
measure a positive delay only if any of the routers was busy 
when the probe arrived at the respective router. Otherwise, 
the probe delay will equal zero. Hence, Wjv(t) is the logical 
OR operation of the individual processes Yi(t) for i g [1, N]. 
Since Yi{t) and Wi(t) £ {0,1}, we straightforwardly find 
Wi(t) at the egress of node i as 



Wi(t) = 



Yi(t) 
Wi. 



1 



(16) 



_ 1 (t)+Y i (t)-W i - 1 (t)Y i (t) , i e [2,N\. 

For ease of exposition, ( ff6l > assumes that a node that is idle 
forwards probe packets instantaneously to the next node, such 
that the probe packet observes Yi(t) at the same time instance 
t for all i 6 [1, AT]. Dispensing with this assumption, ( ff6l > can 
be formulated in the same way requiring, however, additional 
notation as a probe packet observes Y^(t) at t = ij where 
U > U-i f or * G [2,-/V]. 

First, we illustrate ( ff6| ) using a two node example and 
two independent LRD processes Yi(t), Yz(t). The observed 




Fig. 7: Experimental setup: Emulab testbed 



process at the egress of node 2 is W%{t) = 1 if Y\{t) = 1 OR 
Y-z (i) = 1 and W^i) = otherwise, such that we deduce 

W 2 (t) = Yj.it) + Y 2 (t) - Fi(t)y 2 (*). 

We derive the observed covariance cw 2 ( T ) of W2(t) after 
some algebra as 

cw 2 (r) =CYi (r)cy 2 {t)+c Yi (r)(l-^y 2 ) 2 +cy 2 (r)(l-^ yi ) 2 . 

The equation above directly shows that for large r the 
covariance ( r ) is dominated by cy (t) with the largest 
Hurst parameter, i.e., slowest decay. The covariance of the TV- 
node end-to-end observations cw N (t) is obtained using the 
recursion formula (fToTl as 



cw i (T)=c Wi _ 1 {T)c Yi (r) 

+ c Wi _ 1 (r)(l - M yJ 2 + cy(r)(l - m _J 2 . (17) 

Using recursive substitution, it can be shown that the covari- 
ance of the end-to-end observations cw N (i~) is dominated by 
cy (t) with the largest Hurst parameter for i£[l,N]. 

C. Probing Software and Experimental Setup 

We developed a probing tool H-probe, available at J5}, that 
performs online measurements to infer the covariance structure 
of the round trip service of network paths. H-probe injects 
ICMP echo request probes from the sender to the receiver 
and captures the associated round trip times using libpcap. 
In contrast to the (optional) client/server delay measurements 
using UDP, the use of ICMP probes is significantly more 
practical, as it circumvents clock synchronization issues and 
enables probing the path to any network host without the need 
for a receiver software. H-probe uses the method described in 



III-A 



V-A 



Sect. V-A and the statistical analysis discussed in Sect 
and optionally the H estimation technique from Sect. 
For H estimation using the aggregate variance method a range 
of scales has to be fixed for the estimation [4|. A similar ob- 
servation is made in the context of Hurst parameter estimation 
using wavelet decomposition [43 1. The authors in [43 1 describe 
upper and lower cutoff bounds on the time scales considered 
for the estimation using wavelets. Similarly, we fix the range 
of scales M for the H estimation with upper and lower 
cutoffs M up resp. M low . We define the upper range end M up 
depending on the sample size T, i.e., T/M up — 10 2 to ensure 
enough points for the variance calculation at M — M up . We 
fix M low = 0.1 s. This range is consistent with the ranges 
reported in trace driven H estimation literature, e.g., (15), [44|. 
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Fig. 8: Hurst parameter estimates from (a) offline trace anal- 
ysis, (b) offline trace sampling and (c) active probing in the 
Emulab testbed. 



In the following we present results obtained using this 
software package. Fig.^] depicts the experimental setup in our 
Emulab-based testbeqjThe topology comprises two relevant 
links, denoted link 1 and 2. Two traffic senders Si, i € [1,2] 
transmit LRD cross traffic traces with defined Hurst parameter 
H to the receivers Ri. The traces were synthesized by super- 
position of 10 5 heavy tailed on-off sources with tail index a. 
The relation between H and the tail index a is given in |45|. 
We set the mean rate of the traffic at each sender to 50 Mbps, 
with a constant packet size of 1500 Byte. 

We use geometrically distributed inter-sample times with 
p = 0.1 and slot length 5 = 1 ms. For each measurement we 
send 10 6 probes with a mean probing rate of 100 packets 
per second (corresponding to ~ 70 kbps) from the probe 
sender S p to the receiver R$. We use the same parameters 
for the Internet measurements, substituting R3 with PlanetLab 
nodes. To deal with non-queueing induced jitter in routers, 
H-probe substitutes d min in ( fl4| > by the average E[d]. This 
significantly reduces the measurement noise, because we can 
assume the distribution of this jitter is light tailed. We set the 
length of each measurement to 3 hours over which we assume 
stationarity of the traffic processes. 

D. Testbed measurements 

We deploy H-probe in our Emulab testbed, in order to 
verify its functionality in a controlled environment. First, we 
inject synthetic LRD traffic with H G [0.6,0.9] on link 1 
and collect 10 6 samples using our software. We repeat each 
experiment 25 times. We compare the covariance of the full 
traffic traces calculated offline (denoted trace) to the covari- 
ance extracted offline from a sampled process (denoted passive 

4 We use nodes with Supermicro X8DTU server mainboards with 2.2Ghz 
Intel E5520 Xeon processors, quad port Intel 82576EB Gigabit Ethernet 
Controllers, and Ubuntu 10.04 LTS with kernel 2.6.32-24, FIFO scheduling 
and buffers for 5000 packets. All links have a capacity of C = lGbps. 




(a) planetlabl.cis.upenn.edu (b) planetlab01.sys.virginia.edu (c) planetlabl.cs.umass.edu 

Fig. 9: End-to-end covariance estimates from Internet measurements. The covariance structure varies across different paths and 
for different times. For some targets we observe distinct periodicities on different time scales. 



TABLE II: Mean Hurst parameter estimates using packet pair 
probes. 





configured H 


0.6 


0.7 


0.8 


0.9 


estimated H 


0.64 


0.71 


0.79 


0.87 



sampling) as well as from probes using H-probe (denoted 
active probing). To this end, we estimate the Hurst parameter 
using a least square regression of the estimated covariance 
on lags r e [10°, 10 3 ]. The lag range for the regression as 
well as the probing process parameters are chosen according 
to the constraints in Sect. |III-B We show boxplots of the 



corresponding Hurst parameters in Fig. [8] It is evident that 
H-probe correctly estimates the configured Hurst parameters. 

We exemplary deploy the packet pair dispersion method 
described in Sect. |V-A2| in our Emulab testbed using the 
topology in FigjT] We measure the send and receive times of 
the probes using an Endace DAG packet capture card attached 
to network taps at the outgoing resp. incoming ports at S p 
respectively R3. The recorded time stamps t s ,t r 6 K have a 
7.5 ns capture precision. Tab. [ll|includes the mean of estimated 
H over 25 runs. The results indicate that capturing cross traffic 
intensities using packet pairs can be successfully used for 
Hurst parameter estimation. 

In another experiment we inject LRD traffic with differing 
H along links 1 and 2 denoted H\ and H2 respectively. In 



Tab. Ill we show exemplary Hurst parameters obtained for 
all combinations of H 1 = {0.6,0.9} and H 2 = {0.6,0.9} as 
obtained from single packet probes. We note that our method 
correctly characterizes the dominant correlations, respectively, 
H along end-to-end paths from a probing rate of as low as 
70 kbps. 

E. Internet measurements 

We perform measurements over multiple weeks using 
H-probe from our lab in Germany targeting a number of world- 
wide PlanetLab nodes, in order to estimate the correlations on 
end-to-end paths across the Internet. The complex correlation 



TABLE III: Exemplary Hurst parameter estimates in a 2 node 
scenario using single packet probes. 





estimated H on run # 


1 


2 


3 


4 


5 


{Hx =0.6, # 2 = 0.9} 


0.87 


0.89 


0.89 


0.90 


0.90 


{H x =0.9, # 2 = 0.6} 


0.87 


0.88 


0.88 


0.90 


0.90 


{Hi = 0.6,ff 2 = 0.6} 


0.59 


0.62 


0.64 


0.63 


0.63 


{H 1 = 0.9, H 2 = 0.9} 


0.92 


0.92 


0.89 


0.92 


0.89 



structure along exemplary Internet paths is illustrated by the 
covariance plots in Fig. [9] F irst, we observe LRD covariance 
decay depicted in Fig. 9(a) and 9(b) We point out that the 



correlation and hence the Hurst parameter vary significantly 
throughout the day. Moreover, we find that the correlation 
structure varies strongly across different paths. Additionally, 
for some targets we observed distinct periodicities on different 



timescales, as exemplified in Fig. 9(c) Periodic behavior 
in offline Internet traces due to various protocol implemen- 



tations has been previously reported, e.g., in [6|. Fig. 10 



depicts estimated H using the aggregate variance method from 



Sect. IV-A The Hurst parameter estimates indicate a diurnal 



behavior. We provide additional data sets and results in the 



appendix VIII 



H-probe provides a new tool enabling researchers to shed 
light on the complex structure of traffic correlations without 
requiring the availability of traffic traces from Internet service 
providers. 

VI. Conclusions 

In this paper, we derived estimators for the correlations of 
network traffic, given a limited set of traffic samples obtained 
by passive monitoring or active probing. We explored the 
impact of different sampling strategies on observed traffic 
correlations and quantified the impact of sampling on the 
observations. We showed that for finite sample sizes there 
are intrinsic limitations on the accuracy of the estimates and 
showed the influence of different sampling parameters. We 
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Fig. 10: Hurst parameter estimates from continues measure- 
ments over one week. Target is planetlabl.cis.upenn.edu. 
H estimates obtained from the aggregate variance method. 



found a non-linear tradeoff between sampling duration and 
sampling intensity. Further, we inferred the Hurst parameter 
H from covariance estimates to quantify LRD. We devel- 
oped and deployed an active probing method that estimates 
traffic correlations from end-to-end measurements without 
network support. The corresponding software is made publicly 
available. Finally, we presented measurement results from a 
controlled testbed environment as well as Internet paths. We 
observe a complex correlation structure on Internet paths. The 
correlation structure as well as H significantly vary across time 
and paths. In addition to LRD we observe periodic behavior 
at different time scales. 

VII. Appendix 

A. Autocorrelation of sampling point processes: 

We rephrase a basic result from |7| that is essential for the 
following derivations. Given a stationary stochastic process 
A{t) in continuous time r that takes values of either zero 
or one (Kronecker delta). The times between two Kronecker 
deltas are independent and identically distributed according to 
a density function /(r). The autocorrelation density of A(t) 
is known for lags t > as |7| Eq. (4.6.1) 



oo 

E [A(t)A(t + r)] = M a J2 * 

n=l 



(*n) 



(r) 



(18) 



where /^* n ^(r) is the n-fold self-convolution of /(r). The 
intuition behind ( fT8j ) is that starting from one Kronecker delta 
at A[t), another Kronecker delta at A(t+r) can be the first, the 
second , the third . . . , delta to come after the one at A(t). The 
derivation in jTj uses a small time interval of length At —> 0, 
such that MA^i is the probability that a Kronecker delta oc- 
curred in [t, t+At) Eq. (4.5.9). To calculate the correlations of 
A(t) [7] deduces the conditional probability that a Kronecker 
delta occurred at [t + T,t + r + At) given a Kronecker delta at 
[t,t+ At). This is given by J2n=i f { * n) ( T ) At E( i- (4.5.11). 



It follows that (At) 2 [i A J2™=i f { * n) ( T ) is the correlation of 
Kronecker deltas observed in time slots of length At. 

A discrete-time extension of the correlation calculation 
from [7 1 with probability mass functions is straightforward. 
To this end we replace the probability density functions by 
probability mass functions and consider a time slot At = 1 
such that we obtain correlation functions instead of densities. 

For the continuous time distributions considered in this work 
we regard the correlations on fixed time slot basis. We use a 
discretization with a time slot of unit size. 

B. Geometric sampling: 

Given a discrete time sampling process A(t) with inter- 
sample times drawn from a geometric distribution given in 
Tab. [i] with parameter p. The n-fold self-convolution of /(r) 
is the probability mass function (pmf) of the sum of n geo- 
metrically distributed random variables, i.e., negative binomial 
distributed with parameters p and n [14). We insert the pmf 
from (l4) for f ( * n) {r) into ((THJl to find 



E[A(t)A(t + r)} = ha 



n=l 

T 



T — 1 
71—1 

T — 1 



P n (i- P y- 



71—1 



P s (i- P y- s p 



i 

= HaY, 

s=0 
= MAP 

= A.- 

In the second line we used the support of the pmf y(* n ) (j) 
to bound 1 < n < r. In the third line we substituted r — 1 = 
I and n — 1 = s. In the fourth line we used the binomial 
identity 53 s =o ( l s ) xS y l s = ( x + vY ■ Finally, we know from 
the geometric distribution that /ia = P- 

C. Gamma sampling: 

Given the time between two Kronecker deltas is Gamma 
distributed as in Tab. [IJwith parameters a, (3. The sum in ( fT8j ) 
leads to the following expression 



( r ) 

n=l 



e -0r ~ (p T y 



(19) 



t ^— ' T(na) 

n—l x ' 

We derive analytical expressions for the cases a = 2 and 
a = 4 corresponding to Erlang(2) and Erlang(4) distributions 
of the inter-sample times. The mean rate of the sampling 
process is ma = ft /a. We substitute a = 2 into ([19) and 
evaluate the sum in ( fT9] > as 

3r) 2n 



V 



^ (2n- 1)! 

71=1 V ' 



fir sinh(/3r) 



using the series expansion for sinh from [1| Eq. (4.5.62). We 
then exploit the identity sinh(x) = (e x — e~ x )/2 and that 
Ma =0/2 to evaluate (p~8^> as 



E[A(t)A(t + r)]=t,\(l-e- 2 ^) 



For a = 4 we evaluate the sum in (1191) as 



03r 



^ (4/; - 1.1 

n=l v ; 



/3r 



sinh(/3r) — sin(/3r) 



using the series expansion for sinh and sin from (T| 
Eq. (4.5.62) resp. Eq. (4.3.65). We insert this result into ([18 
to find 



E [A(t)A(t + t)} = p\{l - e~ 2flT - 2 sin( ( 8r) ( 



using the identity sinh(a;) = (e x — e x )/2 and that pa = /?/4. 

D. Uniform sampling: 

Given the time between two Kxonecker deltas is uniformly 
distributed with /(r) = 1/6 for r e [0, 6] and zero otherwise. 
The mean rate of the sampling process is \ia = 2/6. In the 
following, we calculate the sum from ( fT8~| for r G [0,6]. We 
expand J2n=i in <QJJ as 

oo „ T 

^ /( *n) (T)=/(r)+ / /(xi)/(T _ Xl)dxi 
. i ^0 

l 

f(T-X 1 )f(xx-X2)f(x2)dx 1 dX2 

2 

f(T-xi)f(xi-X2)f(x 2 -X3)f(x 3 )dxidx 2 dx3 

(20) 
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Since all arguments (•) of the pdf / in ( |20| > are in the range 
[0, r] we can replace all pdfs /(•) in ( |20| > by 1/6. Equation 
( pO| l evaluates then to the series expansion of the exponential 
function, i.e., 



oo 1 

£/ ( * n) M = r 



r r r 
62 + 2P + 3!64 



1 OO 



(t/6)' 



6 * — ' n\ 



(21) 



Finally, we use ( f2T) , ( p~8| ), and that //^ = 2/6 to derive 

for re [0,6]. Since the process is mixing [3|, we conclude 
for r > 6 that the autocorrelation E [A(t)A(t + r)] converges 
quickly to p? A . 

E. Distribution of the autocovariance of geometrically sam- 
pled iid Gaussian sequences: 

Given a sample path w(t) of W(t), that is described by Q, 
where A(t) is a Bernoulli process and Y(t) is a Gaussian 
iid process with mean p Y and variance o\. The mean of 
the observations is known as nw = P-aP-y- The variance of 
W(t) is given by = o\p\ + OyP 2 A + a A a Y through 
independence of A(t) and Y(t). From the Bernoulli process 

A(t) we know <j\ + p? A = pa sucn that we can write 
&w = a \^Y + GyVA- We consider a limited sample size 



T such that w(t) given for t € An unbiased estimator 

of the autocovariance cwij) is 



c w {t) 



1 T_T 

T-r ^ 

t=l 



p w ) . 



After expansion of the product, for large T — r we apply the 
central limit theorem to approximate the individual terms by 
normal random variables to find 

~cw{r) « M (p, 2 w , °« +2 ^°« ) -Af(2p 2 w , 2 -4^) + p 2 w 

The confidence interval is directly obtained as 2 times the 
standard deviation, i.e., ±2 \J o%/ + ^p^a^ J \/(T — r). 
Finally, we insert = o\p Y + OyPA and 

assume T > t to find the confidence interval 

±2y/(a A p Y + a Y p A ) 2 + 4paP y (<t'aP y + <r' Y p A )/VT. 

F. Distribution of the autocovariance of the geometric sam- 
pling process: 

Given a sample path ait) of a Bernoulli sampling process 
A[t) with a limited sample size T such that A(t) is given for 
t G [1,T]. The Bernoulli sampling process has geometrically 
distributed inter-sample times as in Tab. [j] Given the mean pa 
is known. An unbiased estimator of the autocovariance Ca(t) 
is 



ca(t) 



1 



T-T 



T 



(«(*) - am) ( a + r ) - Ma) 



After expansion of the product, for large T-rwe apply the 
central limit theorem to approximate the individual terms by 
normal random variables to find 



< J A+ 2 t J -A< J j' 
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A J T-r J 
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The confidence interval is directly obtained as 2 times the 
standard deviation, i.e., ±2a a\/ &\ + ^Ma/'V^ — T - 

G. B/ai of the autocovariance estimator for Y(t): 

We derive the bias of the covariance estimator ( p2| i if applied 
to a sample path y(t) of the LRD process Y(t) and show that 
it is asymptotically unbiased as the sample duration tends to 
infinity T —> oo. Given a sample path y(t) with sample mean 



Tj=i y(t) and Aw 



MY"o — (T-r) 

covariance estimator is 



(T-r) 



cy(r) = 



^ {y(t)-p Yo ) (y(t + r)- py t ) (22) 



To estimate the bias, we derive the expected value E[cf(t)]. 
To this end, we expand the product and compute the expected 
values of the individual terms y(t)y(t + r) — y{t)p.Y T — 
y(t + t)py +fi Yo fj.Y T as 



1 



T-r 
T-T ^ 

t=l 



w(*)v(t- 



= c r (r) +p Y , 



where cy (r) and /iy are the population parameters, and 
1 



T-T 

T-t E 

t=i 



2/(*)£r T 



where we used that Y^t=i vif) = 



fiy . The same 
argument applies for the product y(t + r)/iy . We estimate 

E [AVo An ] = COV [fly , fry T ] + E [p,y ] E [/2 y T ] 

< Var[Ar ] + My> 

with E [/iy ] = /iy, respectively, E[/iy T ] = /iy that are 
unbiased estimators of the population mean. Note that the 
samples that form fj,y and Jiy r overlap by T — It, such that 
for r <C T we have Cov [fly , jjy T ] w Var[/iy ]. Finally, we 
use that the variance of the mean of T — r samples of an LRD 
process decays as a Y /(T — t) 2 ~ 2H to derive 

t |/*y /*VrJ ~ _ T )2-2ff 

Putting all pieces together we obtain 



2 

My 



E[gy(r)] = C Y {t) -Cov[p,y ,flY T ] 

„2 

~ cy(r) 



(23) 
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(T-T) 

i.e., the estimator underestimates the covariance, where the 
bias diminishes if T — r is large. We note, that the bias cannot 
be easily eliminated if the prefactor 1/(T — r — 1) is used 
instead of 1/ (T — r) in ( |22] i, as it is typically done if the 
covariance is estimated using the sample mean. 

H. Bias of the autocovariance estimator for W(t): 

We derive the bias of the covariance estimator ( p2) if applied 
to the observed process W(t). We show that the estimator is 
asymptotically unbiased for large sample durations T — > oo. 
Given a sample path w(t) with sample mean p,w > respectively, 
jjj\Y T defined in Sect. |VII-G We use the covariance estimator 
from ( p2| . To estimate the bias we derive E[cyp(r)] from ( |23"j ) 
as 

E[cVM] = cw(t) - Cov^Woj/VJ) 

where cw(r) is the population parameter. As before we 
estimate Cov[p, Wo , j2w T ] ~ Var[/2yp ] and express \xyv as a 
sum to compute the variance Var[/iw ] as 



Var[/x Wo ] = 



1 



T-T T-T 



(T-T) 2 ^ ^ 
v ' i=l j=l 



Cov[w(i),w(j)], 



where we use the identity 



Var 



(24) 



= EE Cov [ x -^] 

i=l 3=1 

for random variables Xj, i € [1, u]. Rearranging the statement 
above yields 

T-t-1 



Var \fi Wo ] = 



c w (0) 
T-t 



2 

JT^T) 



(T-T-t)c w (t), 



where we used the notation Cov[w(t),w(t + r)] = Cw(t). 
The expected value of the sample covariance follows as 

Cwr(O) 



E[c w (r)] « cw(r) 
2 



(T-ry 



T-t 

T-t-1 

■ £ (T- T -t)c w (i). (25) 
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/. Aggregate variance of a sampled process: 

Proof of Lem. [2j The aggregated version of the process 
W(t) on the aggregation level M is defined for k £ N as 

fcAf 

^ (M) (fe) = i E w (*)< 



t=l+(fc-l)M 

where M is the block size that is used for averaging. The 
variance of W^ M ^ is obtained using the identity ( p4| as 

M M 

Var(wW) = -^^Cov(W(i)^(i)). 

t=l 3=1 

Using the notation Cvf(t) = Cav(W(i), W(i + t)) we rear- 
range the previous statement as 

M-l 

Var (w (M A 



M 



AP 



J2(M-t)cw(t). (26) 



The same expression can be formulated for Var(A( M )) and 
Var(yi M )) \yy substituting c a (t) resp. cy(r) for Cw(t) in 
<[26j>. Next, we insert Cw(t) from Lem. [Tjinto ( p6| ) to relate 

Var(iy( M )) to Var(A( M )), Var(y( M )), c a {t) and cy(r). We 
obtain 



Var 



(V (M) ) - ~cy(0)(c A (0) + + ~ Cj4 (0K 



M-l 
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After some reordering we arrive at 
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and by application of po) we obtain 

Var = My Var (^ M >) + ^Var (V " ' 



1 2 M_1 

— cy (0)ca (0) + w ( M - t)c y (t)c a (t). 
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VIII. Data sets from Internet measurements 

We perform measurements over multiple weeks using 
H-probe from our lab in Germany targeting a number of world- 
wide PlanetLab nodes. Next, we describe the measurement 
setup: 

• Discretization slot length 5=1 ms. 

• Geometrically distributed inter-sample times with 
p = 0.1. 

• Number of probes collected 10 6 (~ 3 hours) 

• ICMP probing packets of size 64 Byte 

. Probing rate 100 pkt/s ~ 70 kbps (24 Byte layer 2 
overhead) 



In the following we present a representative set of the mea- 
surement results, where the target is planetlabl.cis.upenn.edu. 
We show exemplary end-to-end covariance as well aggregate 
variance estimates at two different days. In addition, we show 
estimates of measurements starting at 10:45 UTC from 17- 
24.7.2012. 

For the autocovariance as well as the aggregate variance 
method the slope of the curve is given by 2H — 2. Slope 
estimates are obtained through least square regression. The 
H estimates from Internet measurements have a moderately 
higher variance compared to active probing results from Fig. [8] 
Further, the H estimates in Fig. 10 show diurnal behavior. 
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Fig. 11: Covariance estimates 20.7.2012 UTC 



Fig. 14: Aggregate variance estimates 20.7.2012 UTC 
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Fig. 12: Covariance estimates 20.7.2012 UTC 



Fig. 15: Aggregate variance estimates 20.7.2012 UTC 




01:45 30.07.2012 
■04:45 30.07.2012 
■07:45 30.07.2012 
■ 10:45 30.07.2012 



-0.5 







0.5 

lo g 10 ( T [ sec l ) 



1.5 



w o -2 











01:45 30.07.2012 
04:45 30.07.2012 

07:45 30.07.2012 
10:45 30.07.2012 





0.5 



1.5 



log 1Q (M [sec] ) 



Fig. 13: Covariance estimates 30.7.2012 UTC 



Fig. 16: Aggregate variance estimates 30.7.2012 UTC 
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Fig. 17: Covariance estimates 30.7.2012 UTC 



Fig. 20: Aggregate variance estimates 30.7.2012 UTC 
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Fig. 18: Covariance estimates at 10:45 UTC 



Fig. 21: Aggregate variance estimates at 10:45 UTC 
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Fig. 19: Covariance estimates at 10:45 UTC 



Fig. 22: Aggregate variance estimates at 10:45 UTC 
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